HeterogeneousAdoptionDiD methodology-review-tracker promotion: In Progress -> Complete #473
Add tests/test_methodology_had.py (6 classes, 34 tests) with paper-
equation-numbered Verified Components walk-through against de
Chaisemartin, Ciccia, D'Haultfoeuille & Knau (2026) arXiv:2405.04465v6
covering Equations 3 / 7 / 11 / 18 / 29 and Theorems 1 / 3 / 4 / 7:
- TestHADTheorem1Design1Prime: Eq. 3 Design 1' WAS recovery + N(0,1)
coverage check at n_replicates=200, G=1000 with KS-stat <= 0.05 and
empirical 95% coverage >= 0.90
- TestHADTheorem3MassPoint: Eq. 11 / Theorem 3 mass-point WAS_{d_lower}
recovery + Wald-IV closed-form equivalence at atol=1e-9
- TestHADTheorem4QUG: Theorem 4 limit-law distributional match against
closed-form F(t) = t/(1+t) at KS-stat <= 0.05, n_draws=5000, G=2000
- TestHADTheorem7YatchewHR: Eq. 29 standard-normal limit, paper-literal
sigma2_diff = 1/(2G) normalization lock
- TestHADJointStute: Section 4.2 step 2 + 4.3 mean-independence variant
H0 fail-to-reject + H1 reject under nonlinear DGP
- TestHADDeviations: equal-weighting invariance, sup-t bootstrap gating,
staggered-timing fail-closed ValueError, safe_inference joint NaN
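For orientation, the Theorem 4 item in the list above compares simulated draws with the closed form F(t) = t/(1+t). That CDF is the law of a ratio of two independent unit exponentials, so a minimal sketch of the KS comparison can use stand-in exponential draws instead of the library's `qug_test` output. The helper names `limit_cdf` and `ks_stat` are hypothetical and are not part of the PR.

```python
import numpy as np

def limit_cdf(t):
    # Closed-form limit law quoted in the test list: F(t) = t / (1 + t).
    return t / (1.0 + t)

def ks_stat(draws, cdf):
    # One-sample Kolmogorov-Smirnov statistic against a continuous CDF.
    x = np.sort(np.asarray(draws, dtype=float))
    n = x.size
    f = cdf(x)
    d_plus = np.max(np.arange(1, n + 1) / n - f)
    d_minus = np.max(f - np.arange(0, n) / n)
    return max(d_plus, d_minus)

rng = np.random.default_rng(0)
n_draws = 5000
# Stand-in for the statistic's limit: ratio of two independent Exp(1) draws.
ratio = rng.exponential(size=n_draws) / rng.exponential(size=n_draws)
stat = ks_stat(ratio, limit_cdf)
print(f"KS statistic: {stat:.4f}")
```

With 5000 draws the statistic should sit well below the 0.05 bound used by the test, which is why that bound is informative rather than loose.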
Add Assumption 5/6 non-testability documentation:
- HeterogeneousAdoptionDiD class docstring: new "Non-testable assumptions
(paper Section 3.1.2)" Notes block citing Section 3.1.2 + cross-
referencing the existing fit-time UserWarning at had.py:3372-3390
- qug_test / stute_test / yatchew_hr_test / did_had_pretest_workflow:
"Scope (what this test does NOT cover)" clauses in Notes sections
explicitly stating tests verify ADJACENT assumptions (4 / 7 / 8) and
CANNOT test Assumptions 5 or 6
Close paper-review checklist L182-L194 + REGISTRY HAD Implementation
Checklist L2602-L2604: Phase 1a/1b/1c implementation closures (panel
validator, design paths, local-linear backend, bias-corrected CI),
staggered-timing fail-closed ValueError, zero-dose UserWarning filter,
Assumption 5/6 non-testability documentation. L2604 (covariates=
Theorem 6 NotImplementedError) remains [ ] with explicit TODO.md
cross-reference (currently a Python TypeError, fail-closed).
Waive Phase-4 validation-harness items #1 (Pierce-Schott 2016 Figure 2)
+ #2 (Table 1 coverage rates) with documented rationale: R parity at
atol=1e-8 in test_did_had_parity.py (3 DGPs x 5 method combos, bit-exact
via rtol=0) is a strictly stronger correctness anchor than coverage-rate
MC. Paper Section 5.2 itself acknowledges that NP estimators are too noisy
to be informative on the LBD-restricted PNTR panel.
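To make the tolerance wording concrete, here is a small sketch of what a comparison with `rtol=0` and `atol=1e-8` checks. The values are invented for illustration and do not come from `test_did_had_parity.py`. With `rtol=0` the criterion reduces to a pure absolute bound; on its own that bound does not require identical bits.

```python
import numpy as np

r_value = 0.123456789012      # hypothetical R reference value
py_value = r_value + 5e-9     # differs by less than 1e-8

# rtol=0 removes the relative term, so the check is |actual - desired| <= atol.
np.testing.assert_allclose(py_value, r_value, rtol=0, atol=1e-8)

def within_abs_tol(actual, desired, atol=1e-8):
    # Same criterion written out by hand.
    return abs(actual - desired) <= atol

inside = within_abs_tol(py_value, r_value)
outside = within_abs_tol(r_value + 5e-8, r_value)
print(inside, outside)
```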
REGISTRY HAD section gains a consolidated Deviations block (5 entries
with framing header distinguishing Notes #1-#2 = implementation choices
from Notes #3-#4 = waived validation-harness work from #5 = Library
extension for staggered-timing fail-closed). Existing scattered Note
entries at L2313 (equal-weighting) and L2398 (sup-t gating) referenced
from the new block.
METHODOLOGY_REVIEW.md HAD row promoted In Progress -> Complete, detail
section rewritten with Verified Components / Test Coverage / Corrections
Made / Deviations / Outstanding Concerns structure mirroring the Bacon /
TripleDifference Complete-row layout.
TODO.md: existing Phase 4 Pierce-Schott row annotated with the 2026-05-20
waiver decision + rationale; new follow-up row for covariates= Theorem 6
NotImplementedError + Theorem 6 pointer (Low priority).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
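The Eq. 29 item in the commit above locks a 1/(2G) normalization on the differencing variance. As a rough sketch, assuming the usual Yatchew form (sort by dose, sum squared successive outcome differences), the estimator can be written as below. The function name `sigma2_diff` mirrors the commit's wording but is a hypothetical stand-in, not the library code.

```python
import numpy as np

def sigma2_diff(d, dy):
    # Sort outcomes by dose, then sum squared successive differences.
    order = np.argsort(d, kind="stable")
    diffs = np.diff(np.asarray(dy, dtype=float)[order])
    g = len(dy)
    # 1/(2G) normalization, as named in the commit (not 1/(2(G-1))).
    return np.sum(diffs ** 2) / (2.0 * g)

rng = np.random.default_rng(1)
G = 2000
d = rng.uniform(0.0, 1.0, size=G)
dy = 0.3 * d + rng.normal(scale=0.5, size=G)   # linear DGP, noise variance 0.25
est = sigma2_diff(d, dy)
print(f"sigma2_diff = {est:.4f} (noise variance 0.25)")
```

Under a linear DGP the differenced signal is negligible between neighbouring doses, so the estimate should land near the noise variance.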
…test
- P2 (Maintainability): fit() docstring on first_treat_col and aggregate="event_study" conflated two staggered-timing branches. Now explicitly documents both: supplied → auto-filter + UserWarning; omitted → fail-closed ValueError + DCDH redirect. Keeps Appendix B.2 wording aligned with the REGISTRY Library extension #5 note.
- P2 (Documentation/Tests): rebuilt the equal-weighting deviation test. Old test duplicated the entire panel uniformly, which is invariant under both equal and cell-size weighting. New test (test_equal_weighting_is_per_row_not_per_dose_cell) replicates only low-D units (D <= 0.15) 4x on a nonlinear DGP (delta_Y = 0.5*D + 1.0*D²) and asserts the att shifts by > 1.5*max(se) AND moves downward. Per-row equal weighting predicts the shift; cell-size weighting (counterfactual) would predict att invariant.
- P2 (Methodology): downgraded the paper-review L191 closure note ("Warnings for extensive-margin effects"). Original text overclaimed REGISTRY had a "suggests running existing DiD" recommendation that does not exist. Now describes the actual library state: qug_test surfaces zero-dose UserWarning; explicit main-path "fall back to DiD" recommendation is a Low-priority follow-up.
- P3 (line refs): swapped hard-coded "had.py:3372-3390" references to a search string ("---- Assumption 5/6 warning on Design 1 paths ----") so they survive future docstring edits. 3 surfaces updated: METHODOLOGY_REVIEW, REGISTRY, paper review.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
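The logic of the rebuilt equal-weighting test can be illustrated with a noise-free toy version. This sketch assumes a simplified WAS with zero boundary intercept (mean(ΔY) / mean(D)) and uses hypothetical helper names; it is not the library estimator or the actual test.

```python
import numpy as np

def was_per_row(d, dy):
    # Simplified WAS with zero boundary intercept: every row weighted equally.
    return dy.mean() / d.mean()

def was_per_dose_cell(d, dy):
    # Counterfactual: collapse to one observation per distinct dose value first.
    cells, idx = np.unique(d, return_index=True)
    return dy[idx].mean() / cells.mean()

d = np.linspace(0.01, 1.0, 100)
dy = 0.5 * d + 1.0 * d ** 2            # nonlinear DGP from the commit message

low = d <= 0.15
d_rep = np.concatenate([d, np.repeat(d[low], 3)])     # low-dose rows now appear 4x
dy_rep = np.concatenate([dy, np.repeat(dy[low], 3)])

row_before, row_after = was_per_row(d, dy), was_per_row(d_rep, dy_rep)
cell_before, cell_after = was_per_dose_cell(d, dy), was_per_dose_cell(d_rep, dy_rep)
print(row_before, row_after)
print(cell_before, cell_after)
```

Per-row weighting moves downward after the selective replication, while the per-cell counterfactual is unchanged, which is the contrast the new test relies on.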
- P1 (Methodology): Eq. 3 / Theorem 1 was previously written as the simplified
WAS = E[ΔY] / E[D] in test docstring + METHODOLOGY_REVIEW.md. The paper and
the in-code HAD docs use the boundary-subtracted form WAS = [E(ΔY) - lim_{d↓0}
E(ΔY | D ≤ d)] / E(D); the library implements
att = (mean(ΔY) - τ_bc) / mean(D). Old DGP set τ_bc ~ 0 so the subtraction
term was untested. Fix:
- Restated Eq. 3 in test_methodology_had.py module + class docstrings,
METHODOLOGY_REVIEW.md, and REGISTRY Deviations Note #1.
- Added boundary_intercept kwarg to _make_two_period_panel so DGP can be
parameterized with delta_Y = c + β*D + ε (c != 0).
- New test_eq3_was_recovery_nonzero_boundary_intercept: c=0.2, β=0.3 →
att should recover 0.3 (not 0.7 = 0.35/0.5, the wrong-formula answer).
Test passes locally; explicit anti-guard against the no-subtraction
failure mode (abs(att - 0.7) > 5 * se).
- P3 (Maintainability): METHODOLOGY_REVIEW.md cited the fit-time UserWarning
as inside _fit_continuous / _fit_mass_point_2sls. Actual emission point is
the outer HeterogeneousAdoptionDiD.fit() dispatch (search anchor preserved).
Also updated the equal-weighting test reference to the new test name.
- P3 (Tech Debt): paper-review L191 (extensive-margin warning) was marked [x]
but described as partial / unimplemented. Flipped to [ ] with a status note
pointing to TODO.md; added a corresponding follow-up row in TODO.md for the
fit-time "consider running standard DiD" warning.
All 35 methodology tests pass; full HAD sweep clean (664 passed).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
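The arithmetic behind the 0.3 versus 0.7 anti-guard in the commit above can be reproduced with a short simulation. The sketch assumes D ~ Uniform(0, 1) (so E[D] = 0.5) and uses an OLS intercept fitted near d = 0 as a crude stand-in for τ_bc; the library's bias-corrected local-linear step is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)
G = 200_000
c, beta = 0.2, 0.3
d = rng.uniform(0.0, 1.0, size=G)                 # E[D] = 0.5 (assumed dose law)
dy = c + beta * d + rng.normal(scale=0.1, size=G)

# Wrong-formula answer: no boundary subtraction, roughly 0.35 / 0.5.
naive = dy.mean() / d.mean()

# Crude stand-in for tau_bc: intercept of an OLS line fitted near d = 0.
near = d <= 0.2
slope, intercept = np.polyfit(d[near], dy[near], deg=1)
att = (dy.mean() - intercept) / d.mean()

print(f"naive = {naive:.3f}, boundary-subtracted = {att:.3f}")
```

The naive ratio absorbs the intercept c into the slope, while subtracting the boundary limit recovers a value close to β.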
…im accuracy
- Test file L42 class-structure bullet still summarized Theorem 1 as the simplified WAS = E[delta_Y] / E[D] shorthand. Rewritten to describe the boundary-subtracted identification + both DGP variants exercised.
- paper-review L193 (multi-period event-study closure) still said staggered panels auto-filter to last cohort with UserWarning. Updated to align with L190 / the implementation: auto-filter only when first_treat_col supplied; ValueError when omitted.
- METHODOLOGY_REVIEW.md test counts updated: 35 methodology tests (was 34; added test_eq3_was_recovery_nonzero_boundary_intercept in R2). T21 drift 17 (was 16); T22 drift 32 (was 28); T20 drift 14 (was unspecified).
- CHANGELOG bullet reworded: was "closes the 3 unchecked Implementation Checklist items at L2684-L2686" which overclaimed. Now: "closes 2 of 3 (staggered fail-closed + Assumption 5/6 docs); covariates= Theorem 6 and extensive-margin warning explicitly tracked in TODO.md as follow-ups." Boundary-subtracted DGP variant explicitly named in the bullet.
All 35 methodology tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…unts
- P2 (Methodology): the new Scope notes claimed QUG "targets Assumption 4 boundary density". The paper's Assumption 4 is broader (positive boundary density + twice-differentiable conditional mean + continuous-positive conditional variance + bandwidth regularity). QUG / Theorem 4 actually tests only the support-infimum null d_lower = 0, which is one clause of Assumption 4. Reworded in 4 surfaces: qug_test Notes, did_had_pretest_workflow Notes, HeterogeneousAdoptionDiD class docstring, paper-review L192 closure. Now phrased as "QUG tests the Theorem 4 / Design 1' support-infimum null d_lower = 0, adjacent evidence on the d_lower = 0 clause of Assumption 4 only, NOT a test of the full statement".
- P3 (Documentation/Tests): T21/T22 drift-test counts fixed in the remaining stale references. METHODOLOGY_REVIEW.md "Verified Components" row updated to 17/32 (was 16/28) + 14 for T20. REGISTRY HAD §"Phase 5 wave 2 first slice" (PR #409) updated to 17 (was 16). The Test Coverage block (already at 17/32) and CHANGELOG (already accurate after R3) unchanged.
All 35 methodology tests pass; lint clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…curacy
- P2 (Methodology): tightened stute_test / yatchew_hr_test / class docstring
to correctly attribute Assumption 7 (mean-independence pre-trends) to
joint_pretrends_test (intercept-only residual form via
null_form="mean_independence") rather than to the raw stute_test helper.
The raw stute_test always fits dy ~ 1 + d and tests Assumption 8 linearity.
Updated all 5 surfaces: stute_test Notes, yatchew_hr_test Notes (now also
documents null="linearity" vs null="mean_independence" kwarg correctly,
no longer references nonexistent "residual_form"), HeterogeneousAdoptionDiD
class docstring (split into 4 distinct ADJACENT condition bullets), REGISTRY
HAD checklist L2694 closure, paper-review L192 closure.
- P3 (Documentation/Tests): the new workflow / REGISTRY / paper-review prose
said the composite verdict surfaces the Assumption 5/6 caveat. Actually
the verdict string only flags the Assumption 7 step-2 gap on the
aggregate="overall" path. Reworded in 4 surfaces (workflow Notes, HAD class
docstring, REGISTRY L2694, paper-review L192) to clarify that the
Assumption 5/6 caveat is surfaced by (a) the Design 1 fit-time UserWarning
and (b) T21 tutorial prose — NOT by the workflow verdict string.
- P3 (Documentation/Tests): yatchew_hr_test Notes referenced a nonexistent
"residual_form" selector. Replaced with the correct kwarg name "null"
({"linearity", "mean_independence"}) and described both branches.
All 35 methodology tests pass; full HAD + drift sweep 665 passed; lint clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- P3 (Methodology): the promoted HAD materials described the Eq. 17/18 `trends_lin=True` linear-trend-detrended variant as "deferred per Phase 4". This conflated TWO different things: (a) the FEATURE, which is shipped via the `trends_lin: bool = False` keyword-only kwarg on HAD.fit(), joint_pretrends_test, and joint_homogeneity_test (PR #389; R-parity locked against DIDHAD::did_had(trends_lin=TRUE) v2.0.0 in test_did_had_parity.py); and (b) the PIERCE-SCHOTT NUMERICAL REPLICATION against the published p=0.51 anchor on the LBD-restricted panel, which IS waived per REGISTRY Deviations Note #3. Updated 3 surfaces (paper-review L194, METHODOLOGY_REVIEW Eq. 18 Verified-Components row, test_methodology_had.py module docstring + TestHADJointStute class docstring) to distinguish "feature shipped + R-parity locked elsewhere" from "Pierce-Schott numerical replication waived".
- P3 (Documentation/Tests): TestHADJointStute promotion narrative overstated H1 coverage as "H0 fail-to-reject and H1 reject on linear vs nonlinear DGPs" for both joint_pretrends_test and joint_homogeneity_test. Reality: H1 rejection is tested only on joint_homogeneity_test via a quadratic post-DGP; joint_pretrends_test gets H0-only coverage in this file (H1 would require a violating-pretrends fixture that re-verifies bootstrap calibration covered by test_had_pretests.py). Narrowed wording in METHODOLOGY_REVIEW Verified-Components row + TestHADJointStute class docstring; CHANGELOG entry unchanged (the H1 reject claim in CHANGELOG explicitly cites the homogeneity side via "H1 reject under nonlinear DGP", which is accurate).
All 35 methodology tests pass; lint clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…1 scope
- R6 fix left METHODOLOGY_REVIEW.md Deviations item #6 stale (only updated the Verified-Components row). Item #6 still said "Eq. 18 linear-trend-detrended joint Stute deferred". Rewritten to match the rest of the HAD tracker: trends_lin=True is SHIPPED + R-parity-locked in test_did_had_parity.py; the methodology-walkthrough file deliberately doesn't duplicate that coverage; the Pierce-Schott published-value numerical replication is what's waived (Deviations Note #3).
- R6 narrowed the Verified-Components row + class docstring but missed the CHANGELOG bullet, which still claimed "joint Stute pre-trends + homogeneity H0 fail-to-reject + H1 reject under nonlinear DGP". Narrowed to: "H0 fail-to-reject on both surfaces and H1 reject for joint homogeneity under a nonlinear DGP", which matches the test file's actual scope.
All 35 methodology tests pass; lint clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- P3 (Maintainability): CHANGELOG hard-coded REGISTRY line references
L2684-L2686. Those lines shifted as we edited REGISTRY across rounds.
Replaced with stable item names ("staggered-timing fail-closed
ValueError" / "Assumption 5/6 non-testability documentation" /
"covariates= Theorem 6 follow-up").
- P3 (Documentation/Tests): two new methodology tests had docstrings
describing a stronger contract than they asserted.
- test_sup_t_bootstrap_skipped_when_cband_false: docstring said
"all-NaN", assertion was "is None". Aligned docstring to the
actual Optional[ndarray] None contract.
- test_safe_inference_joint_nan_on_degenerate_panel: docstring said
"all fields jointly NaN", assertion accepted either all-NaN OR
all-finite (the no-partial-NaN invariant). Renamed test to
test_safe_inference_no_partial_nan_on_degenerate_panel and
rewrote the docstring to match the actual invariant.
All 35 methodology tests pass; lint clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
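The no-partial-NaN invariant described in the commit above (all inference fields NaN together, or all finite together) can be written as a small predicate. The field names in this sketch are hypothetical and are not taken from the library's result object.

```python
import math

def no_partial_nan(fields):
    # True when the inference fields are either all NaN or all finite.
    values = list(fields.values())
    all_nan = all(math.isnan(v) for v in values)
    all_finite = all(math.isfinite(v) for v in values)
    return all_nan or all_finite

ok_nan = {"se": math.nan, "t_stat": math.nan, "p_value": math.nan}
ok_finite = {"se": 0.1, "t_stat": 2.0, "p_value": 0.045}
mixed = {"se": math.nan, "t_stat": 2.0, "p_value": 0.045}

print(no_partial_nan(ok_nan), no_partial_nan(ok_finite), no_partial_nan(mixed))
```

Only the mixed case violates the invariant, which is the weaker contract the renamed test asserts.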
…g lock
- P3 (Documentation/Tests): test_first_treat_col_activates_last_cohort_auto_filter only asserted n_units < G; that would still pass if never-treated controls were accidentally dropped and only the last cohort survived. Strengthened to exact-count assertion: with G=600 and 3 equal-sized cohorts (third=200 each), kept = 200 never-treated + 200 last-cohort = 400. Added a cross-check via the panel's first_treat value set + a kept/dropped count identity (kept + 200 dropped = G).
- P3 (Documentation/Tests): the shared _fit_overall() helper suppressed the Design 1 Assumption 5/6 UserWarning with a comment claiming the warning was "covered by TestHADDeviations", but no test in that class actually asserted the warning fires. Added test_assumption_5_6_userwarning_fires_on_design_1_family which uses pytest.warns(UserWarning, match=r"Assumption [56]") on a mass-point fit to lock the warning surface against silent regression. Also narrowed the helper's warning filter to the exact "Assumption [56]" pattern rather than the broad "(Assumption|continuous_near_d_lower|mass_point)" match; this keeps test output clean without masking unrelated future warnings.
Methodology test count is now 36 (was 35); CHANGELOG + METHODOLOGY_REVIEW counts updated.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
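The effect of narrowing a warning filter to the "Assumption [56]" pattern can be shown with the standard library alone. In this sketch `fit_like` and both warning messages are invented stand-ins; the real warning text in `had.py` may differ. Note that `warnings.filterwarnings` matches its regex against the start of the message.

```python
import warnings

def fit_like():
    # Hypothetical stand-in for a fit that emits two different warnings.
    warnings.warn("Assumption 5 or 6 cannot be tested from the data", UserWarning)
    warnings.warn("mass_point path selected", UserWarning)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Narrow filter: ignore only messages that start with "Assumption 5" or "Assumption 6".
    warnings.filterwarnings("ignore", message=r"Assumption [56]", category=UserWarning)
    fit_like()

messages = [str(w.message) for w in caught]
print(messages)
```

The unrelated warning still surfaces, which is the behaviour the narrowed helper filter is meant to preserve.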
R9's strengthened test asserted the retained cohort set from the INPUT
panel ({0, 2, 3}), which is invariant to what the auto-filter actually
kept — the test would pass even if the estimator dropped the wrong
200 units. Switched to result.filter_info (the canonical source of
truth for the filter's kept/dropped metadata), asserting:
- result.filter_info["F_last"] == 3 (last cohort kept)
- result.filter_info["n_kept"] == 400 (200 never-treated + 200 last)
- result.filter_info["n_dropped"] == 200
- result.filter_info["dropped_cohorts"] == [2] (earlier cohort only)
This now genuinely locks the Appendix B.2 last-cohort + never-treated
contract against silent regression to {2, 3} or any other 400-unit
composition.
All 36 methodology tests pass; lint clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
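The counts asserted in the commit above follow from simple bookkeeping. The sketch below uses a hypothetical `last_cohort_filter` helper and assumes never-treated units are coded as first_treat = 0, consistent with the {0, 2, 3} value set mentioned in the message; it is not the library's filter.

```python
def last_cohort_filter(first_treat):
    # Hypothetical helper: keep never-treated (coded 0) plus the last-treated cohort.
    cohorts = sorted({f for f in first_treat if f != 0})
    f_last = cohorts[-1]
    kept = [f for f in first_treat if f in (0, f_last)]
    return {
        "F_last": f_last,
        "n_kept": len(kept),
        "n_dropped": len(first_treat) - len(kept),
        "dropped_cohorts": cohorts[:-1],
    }

first_treat = [0] * 200 + [2] * 200 + [3] * 200   # G = 600, three equal groups
info = last_cohort_filter(first_treat)
print(info)
```

Asserting on the filter's own metadata, rather than on the input panel, is what distinguishes a correct 400-unit composition from any other 400-unit subset.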
… + ci_params
CI codex flagged that three new Monte Carlo / asymptotic tests in
tests/test_methodology_had.py add fixed-cost MC to the always-on suite
without the @pytest.mark.slow / ci_params.bootstrap(...) gating used
elsewhere in the repo (test_had_mc.py L88-101, test_methodology_dcdh.py
L521-548). Concrete fix: mark slow + route through ci_params.
Gated 3 tests with @pytest.mark.slow + ci_params fixture:
- test_eq3_normal_pivot_coverage: 200 fits @ G=1000
-> ci_params.bootstrap(200, min_n=25); coverage floor 0.85 / 0.65
- test_theorem4_limit_law_distributional_match: 5000 QUG draws @ G=2000
-> ci_params.bootstrap(5000, min_n=200); KS-tol 0.05 / 0.15
- test_eq29_standard_normal_limit_under_linearity: 200 Yatchew draws @ G=2000
-> ci_params.bootstrap(200, min_n=25); KS-tol 0.10 / 0.35
n-conditional tolerance bands per feedback_bootstrap_drift_tests_need_
backend_tolerance: stricter at full n (matches the original pre-gating
test contract), looser at reduced n (covers MC variance with min_n
replicates).
Default suite (no -m '') now: 33 passed + 3 deselected. Slow mode
(-m '') still: 36 passed. METHODOLOGY_REVIEW updated to document the
gating.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
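The n-conditional tolerance bands in the commit above amount to choosing a bound based on how many replicates actually ran. The sketch below is an assumption-laden illustration: `scaled_replicates` is a hypothetical stand-in for `ci_params.bootstrap`, whose real behaviour is not shown in this PR, and `ks_tolerance` is an invented helper.

```python
def scaled_replicates(n_full, min_n, reduced):
    # Hypothetical stand-in for ci_params.bootstrap: full n normally, min_n when reduced.
    return min_n if reduced else n_full

def ks_tolerance(n, n_full, strict, loose):
    # Stricter band at full n, looser band at reduced n.
    return strict if n >= n_full else loose

bands = []
for reduced in (False, True):
    n = scaled_replicates(5000, 200, reduced)
    bands.append((n, ks_tolerance(n, 5000, strict=0.05, loose=0.15)))
print(bands)
```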
🔁 AI review rerun (requested by @igerber). Overall Assessment: ✅ Looks good
…gy-review-tracker-promotion # Conflicts: # CHANGELOG.md
🔁 AI review rerun (requested by @igerber). Overall Assessment: ✅ Looks good
CI codex flagged that METHODOLOGY_REVIEW.md's HAD row + field table say "no R reference" / "paper-direct implementation", but the rest of the PR now relies on chaisemartin::did_had R parity for the Complete designation. Inconsistent provenance for readers.
- Summary row (L61): swap "(paper-direct; nprobust for bandwidth)" for "chaisemartin::did_had (Credible-Answers/did_had v2.0.0); nprobust for bandwidth"
- Field table (L690): replace "None (paper-direct implementation)" with the explicit chaisemartin::did_had reference at atol=1e-8 R parity pin + nprobust auxiliary reference at atol=1e-14 machine precision.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
🔁 AI review rerun (requested by @igerber). Overall Assessment: ✅ Looks good
…ueue + example refs
One informational P3 from CI codex R2: METHODOLOGY_REVIEW.md still described ContinuousDiD as "In Progress" in two surrounding surfaces even after the status-table flip, creating conflicting status signals. Fixed both sites:
1. L27 explanatory paragraph: removed the ContinuousDiD example from the In Progress band's "has methodology file but no paper review" illustration (it's now Complete).
2. L1289-1292 Priority Order queue: removed entry #9 (ContinuousDiD) and renumbered the remaining queue.
Retroactive fix per feedback_changelog_accuracy_fixes (CI review catching one factual error in the queue means scanning for the same mistake): PR #473 promoted HeterogeneousAdoptionDiD to Complete but left entry #6 (HAD) in the same In Progress queue. Removed HAD's entry too and renumbered, so the queue is now self-consistent with the status table for all Complete entries.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes the WooldridgeDiD (ETWFE) methodology-review-tracker promotion in METHODOLOGY_REVIEW.md (In Progress → Complete), following the primary-source review for Wooldridge (2025) merged in PR-A (#484). Adds two paper-driven implementation surfaces and extends R-parity goldens to the nonlinear paths.
Implementation:
- `aggregate(weights="cohort_share")` on WooldridgeDiDResults implements paper Eqs. 7.4 (simple-overall) and 7.6 (event-time, restricted to k>=0) cohort-share aggregation weights as an opt-in alternative to the default cell-count weighting (matching Stata `jwdid_estat`). Inference fields fail-closed to NaN with UserWarning per paper Section 7.5 conditional-on-shares semantics; raises on `survey_design` (design-consistent totals deferred); raises on `type ∈ {"group","calendar"}` (no paper closed-form); raises on bootstrap fits (no matching bootstrap variant). Closes TODO row 95.
- `cohort_trends=True` on `WooldridgeDiD.__init__` adds linear `dg_i · t` cohort-specific trend interactions (paper Section 8 / Eq. 8.1) for the OLS path. Rejects on logit/poisson per paper Section 8 OLS scope; rejects on survey_design pending full-dummy/TSL validation; enforces per-cohort pre-period identification check (≥ 2 observed pre-periods per treated cohort). Auto-routes to full-dummy mode regardless of vcov_type. Closes the PR-A Requirements Checklist heterogeneous-trends gap.
Tests:
- `tests/test_methodology_wooldridge.py` extended with 6 paper-equation-numbered methodology classes (Theorem 3.1, Proposition 5.1, Section 6 event study, Section 7 aggregation paths, Section 8 heterogeneous trends, Section 10 unbalanced panels) + `TestW2025LibraryDeviations` consolidating 5 surviving deviations. Mirrors the HAD PR #473 precedent.
- Two new R-parity surface classes (`TestWooldridgeParityRPoisson`, `TestWooldridgeParityRLogit`) lock the structural surface against R `etwfe(family=...)` log-link goldens.
- 209 tests total (60 methodology + 149 R-parity + unit regressions).
R Goldens:
- `benchmarks/R/generate_wooldridge_golden.R` extended with Poisson + logit DGPs via R `etwfe`; augmented panel CSV retains the same seed-generated `y_pois` + `y_logit` columns for cross-language reproducibility.
- `benchmarks/R/requirements.R` pins `etwfe >= 0.5.0`.
Tracker promotion:
- METHODOLOGY_REVIEW.md L52 status flip with merge date; detail section L583-605 rewritten to the Verified Components / Test Coverage / Corrections Made / Deviations / Outstanding Concerns template mirroring HAD / ContinuousDiD / DCDH. L27 example re-pointed; priority queue items #7-#10 renumbered to #6-#9.
- REGISTRY.md `## WooldridgeDiD (ETWFE)` extended with `### Deviations from the paper / from R / library extensions` block consolidating 7 surviving deviations + opt-in notes for cohort_share + cohort_trends + survey rejection + bootstrap cohort_share rejection contracts.
- CHANGELOG.md `[Unreleased]` `### Added` documents the new parameters, R-parity extension, and tracker flip.
- `docs/methodology/papers/wooldridge-2025-review.md` Requirements Checklist + Gaps & Uncertainties items 1 + 11 marked `**Status:** Closed in PR-B`.
- `docs/api/wooldridge_etwfe.rst` updated with weighting-scheme notes alongside the existing aggregation table.
Second of two PRs for the WooldridgeDiD methodology-review-tracker promotion. PR-A merged at e416aed (#484).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…omplete)
Flips the ContinuousDiD tracker row to **Complete** with full Verified Components / Test Coverage / Corrections Made / Deviations / Outstanding Concerns structure mirroring the HAD precedent (PR igerber#473). Consolidation only: no source code changes, no new tests, no new docstrings.
- METHODOLOGY_REVIEW.md L59 row flipped In Progress -> Complete with Last Review 2026-05-20. L634-655 detail section rewritten with the five-block tracker template: 12 Verified Components rows backed by 15 methodology tests + 80 unit tests + R parity at relative tolerance on 6 benchmark configurations.
- docs/methodology/REGISTRY.md ## ContinuousDiD gains a formal Deviations block (4 entries with framing header) before the Implementation Checklist: boundary-knots Deviation from R + three Phase 2 silent-failures audit fixes documented as library extensions with no R correspondence. Existing Edge Cases bullet and Note entries remain in place; Deviations is the canonical AI-review surface per CLAUDE.md "Documenting Deviations" labels.
- CHANGELOG.md [Unreleased] ### Added gains the ContinuousDiD tracker-promotion bullet at the top with per-benchmark tolerance language calling out the relative-tolerance scope caveat (NOT bit-exact like HAD) due to the boundary-knots deviation precluding algorithmic bit-equality.
- TODO.md gains one consolidated row tracking the three CGBS 2024 feature deferrals (covariates kwarg, discrete-treatment saturated regression, lowest-dose-as-control Remark 3.1); these mirror R contdid v0.1.0's omissions and are explicitly marked deferred in the REGISTRY Implementation Checklist L755-757.
R parity scope: 1% overall ATT on all 6 benchmarks; 1% max ATT(d) curve and 2% max ACRT(d) curve on benchmarks 1-3 via _compare_with_r helper; 1% overall ACRT on benchmarks 4-5; benchmark 6 is event-study ATT-only. NOT bit-exact (atol=1e-8) like HAD; boundary-knots divergence precludes algorithmic bit-equality on aggregated dose-response curves.
89 regression tests pass (80 unit + 9 methodology, R benchmarks deselected without R/contdid installed).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Release notes consolidate 8 PRs since 3.4.0 (2026-05-19):
Public-surface variance lifts:
- SpilloverDiD survey_design on HC1/CR1 via Binder TSL (Wave E.1, igerber#468)
- SpilloverDiD vcov_type=conley + survey_design via stratified-Conley on PSU totals (Wave E.2, igerber#474) + lag_cutoff>0 follow-up (igerber#477)
- SunAbraham vcov_type ∈ {classical, hc1, hc2, hc2_bm} (Phase 1b 1/8, igerber#472)
- WLS-CR2 Bell-McCaffrey gates lifted via clubSandwich port (igerber#475)
Methodology-review-tracker promotions (mostly docs/tests):
- PreTrendsPower R pretrends parity goldens (PR-C, igerber#471)
- HAD methodology-review-tracker promotion (igerber#473)
- ContinuousDiD methodology-review-tracker promotion (igerber#476)
All changes additive; bit-equal defaults preserved across the affected estimators. No new estimators (patch-level per semver convention).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Flip the ChaisemartinDHaultfoeuille (DCDH) row from In Progress to Complete. Adds the Verified Components / Test Coverage / Corrections Made / Deviations / Outstanding Concerns detail section mirroring the ContinuousDiD (PR igerber#476) and HAD (PR igerber#473) precedents. Consolidates 7 DCDH deviations from the paper, from R DIDmultiplegtDYN, and library extensions into a labeled REGISTRY surface per the AI-review "Documenting Deviations" convention. CHANGELOG [Unreleased] gains a new Added entry. L27 In Progress example re-pointed to WooldridgeDiD; L1289 priority-order queue item #6 removed and items #7-#11 renumbered to #6-#10. No source code changes, no new tests, no new docstrings: documentation consolidation only.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
- Promotes the `HeterogeneousAdoptionDiD` (HAD) row in `METHODOLOGY_REVIEW.md` from In Progress to Complete for the de Chaisemartin, Ciccia, D'Haultfœuille & Knau (2026) Weighted-Average-Slope estimator (arXiv:2405.04465v6).
- Adds `tests/test_methodology_had.py` (6 classes, 36 tests) with paper-equation-numbered Verified Components walk-through covering Equations 3 / 7 / 11 / 18 / 29 and Theorems 1 / 3 / 4 / 7. Key fixtures: Eq. 3 boundary-subtracted recovery on both zero-boundary and nonzero-boundary-intercept DGPs (locking the `att = (mean(ΔY) - τ_bc) / mean(D)` subtraction); Eq. 11 mass-point Wald-IV closed-form equivalence at `atol=1e-9`; Theorem 4 QUG distributional match against closed-form `F(t)=t/(1+t)` at KS-stat ≤ 0.05, n_draws=5000; Eq. 29 paper-literal `σ²_diff=1/(2G)` normalization lock; joint Stute H0 fail-to-reject on both pre-trends and homogeneity surfaces plus H1 reject for joint homogeneity under a nonlinear (D + D²) DGP; library-deviation locks (equal-weighting via selective low-dose region replication, sup-t bootstrap gating, staggered-timing fail-closed `ValueError`, Assumption 5/6 `UserWarning` lock, `safe_inference` joint-NaN invariant, last-cohort auto-filter via `result.filter_info`).
- Adds Assumption 5/6 non-testability documentation to the `HeterogeneousAdoptionDiD` class docstring + "Scope (what this test does NOT cover)" clauses to `qug_test` / `stute_test` / `yatchew_hr_test` / `did_had_pretest_workflow` Notes sections, explicitly stating QUG tests the support-infimum null `d_lower=0` (adjacent evidence on one clause of Assumption 4 only); `stute_test` / `yatchew_hr_test` target Assumption 8 linearity; `joint_pretrends_test` targets Assumption 7 mean-independence; none test Assumptions 5 or 6 directly. Reinforced by the existing fit-time `UserWarning` on Design 1 family paths.
- Updates the `fit()` docstring's staggered-timing contract to explicitly document both branches: `first_treat_col` supplied → auto-filter to last-cohort + never-treated with `UserWarning` per Appendix B.2; omitted on multi-cohort panel → fail-closed `ValueError`. Cross-referenced REGISTRY Deviations § "Library extension: Staggered-timing fail-closed" for the rationale.
- Waives Phase-4 validation-harness items #1 (Pierce-Schott 2016 Figure 2) and #2 (Table 1 coverage rates): R parity at `atol=1e-8` in `tests/test_did_had_parity.py` (3 DGPs × 5 method combos, bit-exact via `rtol=0`) is a strictly stronger correctness anchor than coverage-rate Monte Carlo. The paper itself acknowledges (Section 5.2) that NP estimators are too noisy on the LBD-restricted PNTR panel.
- `covariates=` Theorem 6 follow-up and the extensive-margin "consider running standard DiD" main-`fit()` warning explicitly tracked in `TODO.md` as Low-priority follow-ups. Paper-review checklist L182-194 closes Phase 1a/1b/1c implementation-status items plus the Assumption 5/6 documentation closure; the extensive-margin item is left explicitly open (partial coverage).
- `tests/test_had.py` / `test_had_pretests.py` / `test_had_mc.py` / `test_had_dual_knob_deprecation.py` remain unchanged; 5 R-direct parity tests in `test_did_had_parity.py` at `atol=1e-8` are the documented R-parity anchor; nprobust port + bias-corrected port tests at machine precision (atol=1e-12 / 1e-14) cover Eq. 7 separately.
Methodology references (required if estimator / math changes)
- `HeterogeneousAdoptionDiD`, `qug_test`, `stute_test`, `yatchew_hr_test`, `joint_pretrends_test`, `joint_homogeneity_test`, `did_had_pretest_workflow`.
- `docs/methodology/papers/dechaisemartin-2026-review.md`. R reference: `chaisemartin::did_had` (`Credible-Answers/did_had` v2.0.0, SHA `edc09197`); `nprobust` v0.5.0 (Calonico-Cattaneo-Farrell) for bandwidth selection.
- `## HeterogeneousAdoptionDiD` § "Deviations and library extensions" (5 entries):
  - … `_nprobust_port.lprobust` default).
  - … `aggregate="event_study"` + (`weights=` or `survey_design=`) + `cband=True` (stability invariant; unweighted event-study bit-exactly preserves pre-Phase 4.5 B output).
  - … `atol=1e-8` is stronger; paper self-acknowledges NP estimators too noisy on LBD-restricted panel.
  - … `ValueError` (paper prescribes "Warn"; library raises; stricter-safety library extension to prevent silent misuse of the last-cohort-only identification).
Validation
- `tests/test_methodology_had.py` (6 classes, 36 tests, ~960 LoC). All 36 pass; full HAD test sweep (`test_methodology_had.py` + `test_had.py` + `test_had_pretests.py` + `test_did_had_parity.py` + T20/T21/T22 drift tests) reports 665 passed, 2 skipped, 0 failures.
Security / privacy
Generated with Claude Code